Optical Process and Analysis of Historical Documents
Abstract
Collections of historical books are an important source of information, both for the history of earlier periods and for the development of cultural documentation itself. Although several attempts at digitization and electronic navigation have been made to date, there is no appropriate framework for the optical processing and analysis of the content of these collections; consequently, a large number of historical books have not yet been studied and remain unexploited. In this thesis, we study the preprocessing stages that are performed before the recognition process, focusing on the enhancement and segmentation of historical documents. Preprocessing stages play an important role in document image processing because they affect the performance of subsequent processing, such as optical character recognition. At the enhancement stage, we focus on border removal and on the dewarping of document images, two problems commonly associated with historical documents. Two methodologies that detect and remove black borders as well as noisy text regions are proposed. Furthermore, optimal page frames of double-page document images are detected. Experimental results on several historical documents demonstrate the effectiveness of the proposed techniques. Concerning the warping problem, a coarse-to-fine rectification methodology that compensates for undesirable document image distortions is proposed. To verify the validity of the proposed methodology, experiments have been carried out using indirect evaluation techniques as well as a novel semi-automatic evaluation methodology. At the document image segmentation stage, we propose a novel method that combines complementary text line segmentation techniques. Furthermore, a methodology for character segmentation in historical documents is suggested. Comparative experiments using several historical documents from different languages and time periods demonstrate the efficiency of the proposed technique. Finally, we present an efficient technique that eases the construction of document image segmentation ground truth that includes text-image alignment.
Related resources
The Development of "Naqsh-e Jahan" Square in Isfahan
Despite numerous studies regarding the development history of Naqsh-e Jahan Square, there are still many questions which have not been accurately answered to date. Some of them include the history of the square, the exact date of initiation and completion of construction of different elements of the square, and the order of their completion. This article tries to answer these questions accurate...
Segmentation of Handwritten Characters for Digitalizing Korean Historical Documents
The historical documents are valuable cultural heritages and sources for the study of history, social aspect and life at that time. The digitalization of historical documents aims to provide instant access to the archives for the researchers and the public, who had been endowed with limited chance due to maintenance reasons. However, most of these documents are not only written by hand in ancie...
Analysis of the effects of jealousy and competition factors in the process of construction and Function of Safavid and Qajar historical monuments in Esfahan province
Numerous factors have been influential in the creation of Iranian architectural heritage, and due to the diversity of these factors, often the physical and material aspects that have a tangible field of perception have been studied and analyzed. This is while other influential aspects with non-physical nature have been effective in the formation of architectural structure in the context of huma...
Historical Analysis of the Role of Bazaar on the Formation of Iranian Islamic Urban Forms; Case Study: Shiraz, Iran
Iranian Islamic city is a physical entity that represents social, cultural and political mechanisms in the Iranian territory where forms, elements, and rules governing the interaction of the inhabitants and the environment are based on the Islamic worldview. Physical, functional, and spatial centers constitute the main form of the city. Also, Bazaar is one of ...
Representation of Tehran Arg Square Based on Descriptive and Visual Documents
Over time, the Shell of Arg square, has undergone some structural changes. So far the Arg square has not been represented. Representation and shape analysis of the Shell components of the Square can reveal the characteristics of the square. Understanding the components of the shell can be one of the suitable solutions for the rehabilitation of the elements of this valuable square. The size and...
Publication date: 2011